Fixed or adaptive deinterleaved transform coding for image coding and intra coding of video



A coding strategy efficiently codes intra (macroblocks,
regions, pictures, VOP) data. This strategy uses two basic
approaches, a Fixed Deinterleaved Transform Coding approach
and an Adaptive Deinterleaved Transform Coding approach.
Furthermore, within each approach, two types of coders are
developed. One coder operates on an entire picture or VOP
and the other coder operates on small local regions. Using
coders and decoders of the present invention, efficient
coding at a range of complexities becomes possible, allowing
suitable tradeoffs for a variety of applications.
Description

BACKGROUND OF THE INVENTION 

This patent application is based on provisional
application 60/027,436 filed on September 25, 1996.

The present invention relates generally to methods 
and apparatuses for coding images and intra coding of 
video and more particularly to a method and apparatus 
for coding images and intra coding of video using trans- 
form coding. Intra coding is important for applications 
requiring simple encoding/decoding, low delay, a high 
degree of error robustness or a high level of interactivity. 
Examples of such applications include image/video on 
the Internet, wireless video, networked video games, 
etc. 

The state of the art standards in image coding and
intra coding of video employ transform coding (e.g., Discrete
Cosine Transformation, DCT), which involves partitioning
an image into non-overlapping blocks of 8x8 size and coding
in units of a 2x2 array of luminance (Y) blocks and a
corresponding chrominance block of Cr and a block of Cb
signal (together referred to as a macroblock). Improvements
in performance of intra coding have been obtained by
predicting the DC coefficient of DCT blocks by using
previously reconstructed DC coefficients. Recently, further
improvements have been obtained in MPEG-4 by predicting AC
coefficients as well.

Over the past several years, many researchers
have followed a different approach that uses wavelets
instead of transform coding. These researchers have reported
substantial improvements, at the expense of increased
complexity. Recently, variations of wavelet coding that use
subsampling prior to coding have emerged (some of which are
currently being experimented with in MPEG-4). These provide
even higher performance by using advanced quantization
techniques that take advantage of subsampling. However, the
complexity of such schemes is extremely high.

The present invention is therefore directed to the 
problem of developing a method and apparatus for cod- 
ing images and video that has a high coding efficiency 
yet relatively low coding complexity. 

SUMMARY OF THE INVENTION 

The present invention solves this problem by using
a deinterleaving step prior to the transformation step in
combination with a suitable quantization technique.
Deinterleaving is a more flexible form of subsampling.
The method and apparatus of the present invention thus
achieve the coding efficiency of wavelet coding, at the
expense of only a small increase in complexity over
traditional transform coding but at a significantly
reduced complexity relative to wavelet coding.

The present invention includes two forms of deinterleaving
in the encoding process. According to the present invention,
improved intra coding is achieved by fixed or adaptive
deinterleaving, transform (e.g., DCT), quantization with
extensions, and entropy encoding (e.g., Variable Length
Encoding, VLE, or Arithmetic Encoding, AE). According to the
present invention, there are two main approaches to coding.
The first uses fixed deinterleaving, and the second uses
adaptive deinterleaving. As a result of the present
invention, it now becomes possible to use a simple transform
coding method that achieves high coding efficiency using a
fixed deinterleaving approach, or to use a slightly more
complex method that produces somewhat better results based
on an adaptive deinterleaving approach. Both of these
approaches are less complex than wavelet coding while
achieving the same or nearly the same coding efficiency.

According to advantageous implementations of the
present invention, for each approach, two variations exist,
one intended for the separate motion/texture part of the
MPEG-4 Verification Model (VM) (global transform) coding
and the other intended for the combined motion/texture
part of the VM (local transform) coding.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG 1 depicts the basic encoder block diagram for
Global Deinterleaved Transform (GDT) coding according
to the present invention.

FIG 2 shows the corresponding decoder for the encoder
shown in FIG 1 according to the present invention.

FIG 3 depicts the basic encoder block diagram of
Local Deinterleaved Transform (LDT) coding according
to the present invention.

FIG 4 shows the corresponding decoder for the encoder
shown in FIG 3 according to the present invention.

FIG 5 shows a simple example of deinterleaving a
region by a factor of 2 both in the horizontal and vertical
direction to generate subregions.

FIG 6 shows an 8x8 array of subpictures, each 22x18
in size, that results from application of 8:1 deinterleaving
in horizontal and vertical directions to the luminance
signal in GDT coding for QCIF resolution.

FIG 7 shows a 4x4 array of subpictures, each 8x8 in
size, that results from application of 4:1 deinterleaving in
horizontal and vertical directions to 32x32 regions of
luminance signal in LDT coding for QCIF resolution.

FIG 8 depicts one method of extended quantization,
QuantX Method 1, according to the present invention.

FIG 9 shows the inverse operation of the extended
quantization method shown in FIG 8, QuantX Method 1,
according to the present invention.

FIG 10 depicts an example of quantized DC coefficients
prediction used in the present invention.

FIG 11 shows an example of AC coefficient prediction
structure employed in the present invention.

FIG 12 depicts another method of extended quantization,
QuantX Method 2, according to the present invention.

FIG 13 shows the inverse operation of the extended
quantization method shown in FIG 12, QuantX Method 2,
according to the present invention.

FIG 14 depicts another method of extended quantization,
QuantX Method 3, according to the present invention.

FIG 15 shows the inverse operation of the extended
quantization method shown in FIG 14, QuantX Method 3,
according to the present invention.

FIG 16 shows the block diagram of an Adaptive Global
Deinterleaved Transform (AGDT) encoder 160 used
in the present invention.

FIG 17 shows the block diagram of the AGDT decoder
used in the present invention, which corresponds
to the AGDT encoder shown in FIG 16.

FIG 18 shows a block diagram of an Adaptive Local
Deinterleaved Transform (ALDT) encoder used in the
present invention.

FIG 19 shows the ALDT decoder used in the
present invention, which corresponds to the encoder of
FIG 18.

FIG 20 shows an example of quadtree segmenta- 
tion employed in the present invention. 

DETAILED DESCRIPTION 

According to the present invention, a significant
improvement in efficiency of intra coding within the
framework of transform coding is now possible. The present
invention has been designed for use with MPEG-4. Often
in MPEG-1/2 video coding, intra coding is employed
for pictures coded by themselves, which are called
I-pictures, or for intra macroblocks in predictively coded
pictures (P-pictures or B-pictures). Besides pictures or
macroblocks, MPEG-4 also introduces the concept of
Video Objects (VO) and Video Object Planes (VOPs).

In MPEG-4 coding a scene can be partitioned into 
a number of Video Objects each of which can be coded 
independently. A VOP is a snapshot in time of a Video 
Object. In fact, a picture then becomes a special case 
of a VOP that is rectangular in shape. 

MPEG-4 coding also includes coding of VOPs of different
types, such as I-VOPs, P-VOPs, and B-VOPs,
which are generalizations of I-pictures, P-pictures, and
B-pictures, respectively. Thus, in addition to coding of
I-pictures and intra macroblocks, the present invention
can also be used for coding of I-VOPs, both rectangular
and arbitrary in shape.

The main functionality addressed by the present
invention is the coding efficiency of I-VOPs, although this
approach is extendable to coding of P-VOPs and B-VOPs.
An additional functionality that could be indirectly
derived from the present invention is spatial scalability.

The present invention achieves a significant
improvement in intra coding efficiency (by a factor of 1.5
or more). This improvement is obtained while still employing
the DCT coding framework, and requires only the addition of
deinterleaving and extended quantization. In general, intra
coding efficiency can also be somewhat improved by
further improving DC coefficient prediction and by
incorporating AC coefficient predictions and scanning
adaptations. However, even after combining all these
techniques, the improvements may be relatively small
compared to the potential improvement of the present
invention.

For I-VOPs coding in MPEG-4, FIG 1 depicts the
basic encoder 10 block diagram of Global Deinterleaved
Transform (GDT) coding according to the present invention.
At the input to the deinterleaver 11, the image is in
pixel format with each pixel being represented by the
three components of luminance, chrominance and saturation,
which are digital values. These digital values
are fed into the deinterleaver 11, which separates
contiguous samples within the image. In other words, the
deinterleaver 11 separates the pixel set into a number
of pixel subsets, but does so by creating subsets from
non-contiguous pixels. This then requires specification
of the separation pattern used. Each subset contains
several of the digital samples. However, the samples
within a given subset were not contiguous with each other
in the original image.

The transform operation 12 converts the digital pixel
values within each subset into transform coefficients,
such that most of the energy is packed in a few coefficients.
In this step, for example, the Discrete Cosine Transform
(DCT) can be used. Other known transform techniques can
also be used.

The transformer 12 receives the pixel subsets from
the deinterleaver 11. Each subset contains several pixels,
with each pixel being represented by the three values
of chrominance, luminance and saturation, or some
equivalent color system. The transformer 12 then outputs
coefficients representing the spatial frequency
components of the values within each subset. While
there is no real compression at this point, the DCT
groups the data in a way that enables the later processes
to significantly reduce the data. The DCT concentrates
most of the information within the subset in the lower
spatial frequencies, and many of the higher spatial
frequencies come out to be zero, resulting in compression
later on. The output of the transformer 12 then is a block
of coefficients, one block for each subset created by the
deinterleaving process 11.

The QuantX process 13 includes normal quantization
plus some extensions to improve coding efficiency
and prepares the data for entropy encoding 14. This will
be described in detail below, as three different QuantX
processes 13 are presented herein. The output of the
QuantX process 13 is a block of bits, one for each subset
created in the deinterleaving process 11.

Following the QuantX process 13 is the encoding
process 14. In this case entropy encoding is used. Any
form of entropy encoding will suffice. Variable Length
Encoding (VLE) is one example of entropy encoding.
Arithmetic Encoding (AE) is another example. The coded
bitstream generated by the Entropy Encoder 14 can
now be stored or transmitted.

Other known entropy encoding methods can also be used.

FIG 2 shows the decoder 20 corresponding to the
encoder 10 shown in FIG 1. The decoder 20 for Global
Deinterleaved DCT Encoding of I-VOPs includes an entropy
decoder 21, an inverse quantization 22, an inverse
transform 23 and a reinterleaver 24. The entropy decoder 21
inverts codewords back to coefficient data. The
inverse quantization 22 performs the inverse operation
of the quantization 13 plus some extensions performed
in the encoder 10. The inverse transform 23 performs
the inverse operation of the transform 12, and the
reinterleaver 24 performs the inverse of the deinterleaver
11.

The coded bitstream is fed into the entropy decoder
21. The entropy decoder 21 outputs a block of data similar
to that input to the entropy encoder 14. Due to the
deinterleaving 11 performed on the coding side, this
block of data is subgrouped into blocks corresponding
to the subsets created by the deinterleaver 11. The entropy
decoder 21 outputs the block of data to the Inverse
QuantX 22.

The Inverse QuantX 22 will be described in detail 
below, as three different processes are presented here- 
in, depending upon the coding process performed. The 
Inverse QuantX 22 feeds its output to the inverse trans- 
form 23. The output of the Inverse QuantX 22 is a block 
of coefficients, which can be further subgrouped accord- 
ing to the subsets created by the deinterleaving process 
11. 

The inverse transform 23 then performs the inverse 
transform operation on each subblock of coefficients to 
convert it to a subblock of pixel subsets. These sub- 
blocks are now ready for reinterleaving. The reinterleav- 
er 24 then reconstructs the original order in which the 
pixels appeared. 

FIG 3 depicts the basic encoder 30 block diagram 
of LDT coding. The main difference with respect to FIG 
1 is that prior to deinterleaving an input VOP or picture 
is segmented into local regions. These regions can be 
either square (blocks) or even arbitrary in shape. With 
such segmentation, some of the coding details related 
to transform size and QuantX may change. 

The image in pixel format is fed into the local regions
segmenter 31, which outputs the segmented image signal
to the deinterleaver 32. In this case, the local regions
segmenter 31 creates subsets of pixels, which are
contiguous. Then, in the deinterleaving step 32, these
subsets are further partitioned so that the resulting
partitioned subsets each contain non-contiguous pixels.

The remaining process is the same as in FIG 1. The
deinterleaver 32 passes its output to the transformer 33,
which in turn feeds the QuantX 34, which feeds the entropy
encoder 35, which outputs the coded bitstream.

FIG 4 shows the corresponding decoder 40 for the
encoder 30 shown in FIG 3. The main difference with
respect to the GDT decoder shown in FIG 2 is the addition
of the local regions assembler 45 (e.g., block unformatter)
at the end of the decoding process. The local
regions assembler 45 performs the inverse operation of
the local regions segmenter 31. The LDT decoder 40
includes the entropy decoder 41, the inverse quantization
42, the inverse transform 43, the reinterleaver 44,
and the local regions assembler 45.

The process in FIG 4 is identical to the process in
FIG 2, except that each step in the process is being
performed on a partitioned subset relative to FIG 2. For
example, the encoded bits are fed into the entropy decoder
41. These bits are ordered by the local regions created
in the encoding process 30. Thus, each step is performed
on each local region group separately. The decoder 41
then outputs groups of blocks of bits to the
Inverse QuantX 42, which then creates the groups of blocks
of coefficients necessary for the inverse transform process
43. The inverse transform process 43 then outputs the
groups of pixels to the reinterleaver 44, which
reinterleaves the pixels within each group. Output from the
reinterleaver 44 is therefore the local regions created by
the local regions segmenter 31. One can view the decoding
process 40 in this case as being performed on each local
region separately. At the end of this decoding process
40, the local regions are then assembled by the local
regions assembler 45 to reconstruct the pixel image as
it was presented to the local regions segmenter 31.

We shall now describe some of the steps within the
process in more detail. These include deinterleaving
and QuantX.

Deinterleaving 

Deinterleaving is the process of separating an input
picture (or a region) into subpictures (or subregions)
such that the neighboring samples in the input picture
(or region) are assigned to different subpictures (or
subregions). The resulting subregions or subpictures thus
contain samples that were not contiguous in the original
picture.

FIG 5 shows a simple example of deinterleaving a
region by a factor of 2 both in the horizontal and vertical
direction to generate subregions. The original picture 51,
composed of pixels (o, x, +, -), is deinterleaved into
four subpictures 52-55. Every other element of the first
row (o, x, o, x, o, x) is assigned to the first row of
the first subpicture 52 (o, o, o), and the remaining
elements to the first row of the second subpicture 53
(x, x, x). The same is true for the remaining odd rows.
The even rows are split in the same way and assigned to the
third and fourth subpictures, (+, +, +) and (-, -, -),
respectively. Essentially, each pixel $p_{i,j}$ is assigned
to subpicture $(k, m)$, where $k = i \bmod n$ and
$m = j \bmod n$, and becomes element $p_{r,s}$ in that
subpicture, where $r = (i-k)/n$ and $s = (j-m)/n$.

For example, with $n = 2$ as in FIG 5, we note that
element 56 (i.e., $p_{2,3}$) is assigned to subpicture
01 (element 53), that is, $k = 2 \bmod 2 = 0$ and
$m = 3 \bmod 2 = 1$. If we examine subpicture 01
(element 53), we note that element 57 appears as pixel 11
in that subpicture, since $r = (i-k)/n = (2-0)/2 = 1$ and
$s = (j-m)/n = (3-1)/2 = 1$.
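
The index mapping above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the deinterleaver 11 or reinterleaver 24: the function names `deinterleave` and `reinterleave` and the nested-list layout `subs[k][m][r][s]` are assumptions made for the example, and the picture dimensions are assumed to be multiples of n.

```python
def deinterleave(picture, n):
    """Split a 2-D list into an n x n grid of subpictures.

    Pixel picture[i][j] goes to subpicture (k, m) = (i % n, j % n)
    and lands at position (r, s) = ((i - k) // n, (j - m) // n).
    The result is indexed as subs[k][m][r][s] (an assumed layout).
    """
    rows, cols = len(picture), len(picture[0])
    if rows % n or cols % n:
        raise ValueError("picture dimensions must be multiples of n")
    return [[[[picture[r * n + k][s * n + m] for s in range(cols // n)]
              for r in range(rows // n)]
             for m in range(n)]
            for k in range(n)]


def reinterleave(subs, n):
    """Inverse of deinterleave: restore the original pixel order."""
    sub_rows, sub_cols = len(subs[0][0]), len(subs[0][0][0])
    return [[subs[i % n][j % n][i // n][j // n]
             for j in range(sub_cols * n)]
            for i in range(sub_rows * n)]


# The FIG 5 pattern: rows alternate between (o, x, ...) and (+, -, ...).
fig5 = [list("oxoxox"), list("+-+-+-"), list("oxoxox"), list("+-+-+-")]
subpictures = deinterleave(fig5, 2)
```

With this layout, `subpictures[0][1]` holds only `x` samples, and `reinterleave(subpictures, 2)` returns the original picture.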

In GDT coding, the deinterleaving factor is fixed
to 8:1 for QCIF input resolution (176x144).
For this resolution, FIG 6 shows an 8x8 array, 63, of
subpictures, each subpicture 62 being 22x18 in size, that
results from application of 8:1 deinterleaving in horizontal
and vertical directions to the luminance signal. Also,
each of the chrominance components is deinterleaved
by a factor of 4:1, resulting in a 4x4 array of subpictures,
each of size 22x18.

On the other hand, in LDT coding, the deinterleaving
factor is fixed to 4:1 for QCIF input resolution. FIG 7
shows a 4x4 array, 73, of subpictures, each subpicture
72 being 8x8 in size, that results from application of 4:1
deinterleaving in horizontal and vertical directions to
32x32 regions of luminance signal. In this case, the
chrominance components are deinterleaved by a factor
of 2:1, resulting in a 2x2 array of subregions, each 8x8
in size.
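
The subpicture sizes quoted above follow directly from dividing the input dimensions by the deinterleaving factor. The short check below is only a sketch: the helper name `subpicture_size` is hypothetical, and the chrominance dimensions (88x72 for the picture, 16x16 for a 32x32 luminance region) assume 4:2:0 subsampling, which the text does not state explicitly.

```python
def subpicture_size(width, height, factor):
    """Return (width, height) of each subpicture after factor:1
    deinterleaving in both directions."""
    if width % factor or height % factor:
        raise ValueError("dimensions must be multiples of the factor")
    return width // factor, height // factor


# GDT coding at QCIF resolution.
gdt_luma = subpicture_size(176, 144, 8)    # 8x8 array of subpictures
gdt_chroma = subpicture_size(88, 72, 4)    # 4x4 array (assumes 4:2:0)

# LDT coding on 32x32 luminance regions.
ldt_luma = subpicture_size(32, 32, 4)      # 4x4 array of subregions
ldt_chroma = subpicture_size(16, 16, 2)    # 2x2 array (assumes 4:2:0)
```

Both GDT results come out as 22x18 and both LDT results as 8x8, which is why a single transform size serves luminance and chrominance in each mode.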

DCT

Two dimensional DCT is applied to deinterleaved 
subpictures or subregions. In GDT coding at QCIF res- 
olution, the size of DCT is chosen to be 22x18 both for 
the luminance as well as the chrominance components. 
In LDT coding at QCIF resolution, the size of DCT is 
chosen to be 8x8 both for the luminance as well as the 
chrominance components. 
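
As an illustration of a two dimensional DCT on a non-square 22x18 subpicture, the following sketch applies a separable, orthonormal DCT-II along rows and then columns. It is a direct, unoptimized form chosen for clarity; the function names and the orthonormal scaling are assumptions, since the text does not specify a normalization.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
            for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A flat 22x18 subpicture (18 rows of 22 samples) of value 100.
flat = [[100.0] * 22 for _ in range(18)]
coeffs = dct_2d(flat)
```

For this flat input all of the energy lands in the DC coefficient `coeffs[0][0]`, equal to 100 times the square root of 396 under the assumed scaling, and every AC coefficient is zero up to rounding error.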

QuantX Choices 

The normal scalar quantization needs to be modi- 
fied to take into account the fact that the transform cod- 
ing is performed on deinterleaved data. Beyond quanti- 
zation, the coefficient prediction of experiment may also 
be more effective in increasing coding efficiency due to 
higher correlation between coefficients of deinterleaved 
adjacent subpictures (subregions). Another approach is 
to exploit this correlation by forming vectors of coeffi- 
cients of the same spectral frequency and performing 
DCT coding on such vectors (blocks). Finally, yet anoth- 
er alternative is to use vector quantization or a specific 
variation called Lattice Vector Quantization (LVQ) being 
examined in MPEG-4. These various approaches are 
referred to here as QuantX and offer different tradeoffs 
in performance versus complexity and the right one may 
be selected based on the application. 

QuantX Method 1: Quantization and DCT Coefficient
Predictions

This method is explained by reference to FIG 8. The
signal input to the QuantX 80 is received by a quantizer
81, whose output signal is split. One path feeds a DC &
AC Coefficient Predictor 82, and another path feeds one
input of a subtractor 83. The output of the DC & AC
Coefficient Predictor 82 feeds the other input of the
subtractor 83. The output of the DC & AC Coefficient
Predictor 82 is subtracted from the output of the quantizer
81 and fed into a scanner 84, such as a zigzag scanner.

In GDT coding, the DCT coefficient subpictures of
size 22x18 are quantized by the normal scalar quantization
and then the coefficient subpictures are predicted
based on previously quantized coefficient subpictures
and coefficient difference subpictures are formed. In
LDT coding, a very similar operation takes place on DCT
coefficient subregions of size 8x8. The difference
coefficients are scanned (e.g., zigzag scanned) to form
(run, level) events.

FIG 9 shows the inverse operation of the QuantX
shown in FIG 8. The signal input to the Inverse QuantX
90 is received by the Inverse Scanner 91. The output of
the inverse scanner 91 is fed to one input of an adder
92. The second input of the adder 92 is from the output
of a DC & AC Coefficient Predictor 93, which receives
its input from the output of the adder 92. The output of
the adder 92 is also passed to the inverse quantizer 94,
which outputs the desired signal.

A scheme for quantized DC coefficients prediction
is illustrated in FIG 10. In GDT coding, the DC prediction
of the leftmost subregion (subpicture) is set to 128,
while in LDT coding, using 1 bit of overhead, a selection
between the DC values of horizontally or vertically adjacent
subregions (subpictures) is made. For the remaining
subregions (subpictures) in the first row, DC prediction
uses the previous subregion (subpicture) DC value. For the
first subregion (subpicture) of the second row, DC
prediction is taken from the subregion (subpicture) above;
all the other subregions (subpictures) of that row use
Graham's predictor adaptively by selecting from horizontal
and vertical adjacent subregion (subpicture) DC
values without overhead. The prediction process for the
second row is repeated for subsequent rows.
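
The GDT form of this DC prediction scheme can be sketched as follows. The text does not spell out the selection rule of Graham's predictor, so the gradient comparison used here (predict from above when the left and above-left DC values differ less than the above-left and above values, otherwise from the left) is an assumption modeled on the rule commonly used for MPEG-4 intra DC prediction. The function names are hypothetical. Because the rule depends only on previously decoded values, the decoder can repeat it without overhead.

```python
def _predict_one(dc, r, c, first_value):
    """Prediction for subpicture (r, c) from already known DC values."""
    if r == 0 and c == 0:
        return first_value                  # leftmost subpicture, first row
    if r == 0:
        return dc[r][c - 1]                 # rest of first row: previous DC
    if c == 0:
        return dc[r - 1][c]                 # first of later rows: DC above
    left, above, diag = dc[r][c - 1], dc[r - 1][c], dc[r - 1][c - 1]
    # Assumed gradient rule standing in for Graham's predictor.
    return above if abs(left - diag) < abs(diag - above) else left


def dc_differences(dc, first_value=128):
    """Encoder side: quantized DC values minus their predictions."""
    return [[dc[r][c] - _predict_one(dc, r, c, first_value)
             for c in range(len(dc[0]))]
            for r in range(len(dc))]


def reconstruct_dc(diff, first_value=128):
    """Decoder side: rebuild DC values in raster order."""
    dc = [[0] * len(diff[0]) for _ in diff]
    for r in range(len(diff)):
        for c in range(len(diff[0])):
            dc[r][c] = diff[r][c] + _predict_one(dc, r, c, first_value)
    return dc


quantized_dc = [[130, 132, 131], [129, 140, 138], [128, 139, 150]]
differences = dc_differences(quantized_dc)
```

Here `reconstruct_dc(differences)` returns `quantized_dc` unchanged, which is the property the adder 92 and predictor 93 loop of FIG 9 relies on.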

We now discuss how AC coefficient predictions are
made. FIG 11 shows an example of AC coefficient prediction
structure employed. In the case of LDT coding
with 8x8 subregions, 2 rows and 2 columns of AC coefficient
predictions for a subregion may be used. In case
of larger subregion sizes in LDT coding or larger pictures
in GDT coding, more AC coefficients may be predicted;
the number and the structure of coefficients being
predicted can be different but the basic principle of
prediction remains the same.

For the left-most subregion (subpicture) of a region
(picture), AC coefficient predictions are reset to 0. For
the subsequent subregions (subpictures) in the first row
of subregions, the L-shaped highlighted area (without
DC coefficient) is predicted from the previous subregion
(subpicture). For the first subregion of the second row of
subregions, the same L-shaped area is predicted from the
subregion immediately above. Subsequent subregions of the
second row are predicted by using the first two columns of
coefficients from the previous subregion and the first two
rows from the subregion above. There is an overlap of
1 coefficient (AC11), which is resolved by averaging the
two prediction choices for this coefficient to generate a
single prediction coefficient. The prediction process for
the second row is repeated for subsequent rows.

Further, it is also possible to make the prediction
process adaptive (with or without overhead).

The difference coefficient subpictures of size 22x18
in GDT coding and subblocks of size 8x8 in LDT coding
are zigzag scanned to form (run, level) events.
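
A zigzag scan that works for both the 8x8 subblocks and the non-square 22x18 subpictures can be sketched by walking the anti-diagonals of the block in alternating directions. The function names are hypothetical, and two details are assumptions not stated in the text: the scan starts by moving right from the DC position, as in the conventional 8x8 zigzag, and trailing zeros are dropped on the understanding that an end-of-block signal would cover them.

```python
def zigzag_order(rows, cols):
    """Visit order (row, col) along anti-diagonals, alternating direction."""
    order = []
    for d in range(rows + cols - 1):
        cells = [(i, d - i) for i in range(rows) if 0 <= d - i < cols]
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        order.extend(cells if d % 2 else reversed(cells))
    return order


def run_level_events(block):
    """Convert a 2-D block of coefficients into (run, level) events.

    run   = number of zeros preceding a nonzero coefficient in scan order
    level = the nonzero coefficient value
    """
    events, run = [], 0
    for i, j in zigzag_order(len(block), len(block[0])):
        value = block[i][j]
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    return events  # trailing zeros are not emitted (assumed end-of-block)


example_block = [[5, 0, 0, 0],
                 [3, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, -1, 0, 0]]
events = run_level_events(example_block)
```

For the 4x4 example the scan yields the events (0, 5), (1, 3) and (7, -1). A 22x18 subpicture has 396 scan positions.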

QuantX Method 2: Quantization and DCT of DCT 
Coefficient Vectors 

FIG 12 illustrates the operations employed in this
method of QuantX. In GDT coding of QCIF pictures, the
DCT coefficient subpictures of size 22x18 are prequantized
by a small quantization level (Qp = 2 or 3) to reduce
their dynamic range. Vectors (of 8x8 size for
luminance and 4x4 size for chrominance) are then generated
by collecting all coefficients of the same frequency
through all of the subpictures, and these vectors are DCT
transformed and quantized. In LDT coding with regions of
size 32x32, a very similar operation takes place, resulting
in coefficient vectors (of 4x4 size for luminance and 2x2
size for chrominance); these vectors are DCT transformed
and quantized. In GDT coding, quantized DCT coefficient
vectors of 8x8 size for luminance and 4x4 size for
chrominance are zigzag scanned to form (run, level) events.
In LDT coding, quantized DCT coefficient vectors of 4x4
size for luminance and 2x2 size for chrominance are zigzag
scanned to form (run, level) events.
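
The vector formation step, collecting the coefficient at one frequency position from every subpicture, amounts to swapping the two pairs of indices. The sketch below assumes a nested-list layout `coeffs[k][m][u][v]` (subpicture grid position first, frequency position second) and a hypothetical function name. It covers only the regrouping, not the prequantization, second DCT or quantization.

```python
def form_frequency_vectors(coeffs):
    """Regroup coeffs[k][m][u][v] into vectors[u][v][k][m].

    (k, m) indexes the subpicture within the grid and (u, v) the
    frequency within a subpicture, so vectors[u][v] is the block of
    all coefficients of frequency (u, v) across the subpictures.
    """
    grid_rows, grid_cols = len(coeffs), len(coeffs[0])
    sub_rows, sub_cols = len(coeffs[0][0]), len(coeffs[0][0][0])
    return [[[[coeffs[k][m][u][v] for m in range(grid_cols)]
              for k in range(grid_rows)]
             for v in range(sub_cols)]
            for u in range(sub_rows)]


# GDT luminance at QCIF: an 8x8 grid of 22x18 coefficient subpictures
# (18 rows of 22 coefficients each), filled with zeros for illustration.
gdt_coeffs = [[[[0] * 22 for _ in range(18)] for _ in range(8)]
              for _ in range(8)]
gdt_vectors = form_frequency_vectors(gdt_coeffs)
```

This produces 18 x 22 = 396 vectors, each of 8x8 size, matching the luminance case described above. Applying the same function to its own output restores the original layout, so it can also stand in for the vector unformatter.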

Referring to FIG 12, the QuantX 120 includes a
prequantizer 121, a vector formatter 122, a transform 123,
a quantizer 124 and a scanner 125. The prequantizer
121 receives the signal input to the QuantX 120, and
outputs its signal to the vector formatter 122. The vector
formatter feeds its output to the transform 123, which in
turn feeds the quantizer 124, which feeds the scanner
125. The scanner outputs its signal as the output of the
QuantX 120.

The Inverse Operation 130 of the QuantX 120
shown in FIG 12 is shown in FIG 13. The input to the
inverse QuantX 130 is fed to the inverse scan 131, which
feeds the inverse quantizer 132, which in turn feeds the
inverse transform 133. The vector unformatter 134 receives
the output from the inverse transform 133 and
outputs its signal to the inverse prequantizer 135, whose
output represents the output of the inverse QuantX 130.

QuantX Method 3: Lattice Vector Quantization of 
DCT Coefficient Vectors 

FIG 14 illustrates the operations employed by this
method of QuantX. The signal input to the QuantX 140
is received by the Vector Formatter 141, which passes
its output to the Dimension Reducer 142, which in turn
feeds its output to the Vector Quantizer 143. The Vector
Quantizer 143 then passes its output to the Vector
Quantization Indices Orderer 144, whose output represents
the output of the QuantX 140.

In GDT coding of QCIF pictures, using DCT coefficient
subpictures of size 22x18, vectors (of 8x8 size for
luminance and 4x4 size for chrominance) are generated
by collecting all coefficients of the same frequency
through all of the subpictures; these vectors are quantized
by the LVQ. In LDT coding of region size 32x32, a
very similar operation takes place resulting in coefficient
vectors (of 4x4 size for luminance and 2x2 size for
chrominance); these vectors are also quantized by LVQ.
Since VQ often requires small blocks for manageable
size codebooks (or in LVQ, a manageable complexity),
a reduction of vector dimension may be necessary and
is accomplished in the Dimension Reducer, which can be
as simple an operation as dividing a vector of coefficients
into sub-vectors or something more sophisticated.
The process of LVQ is not described herein and is
discussed in the literature. Briefly, however, LVQ of
dimension 16 is tried first; if it produces errors higher
than a threshold, then LVQ of dimension 4 is tried. Also,
after LVQ, the LVQ indices of the entire picture or region
may be ordered for increased efficiency; this process takes
place in the VQ Indices Orderer.

The Inverse Operation 150 of the QuantX 140
shown in FIG 14 is shown in FIG 15. The input to the
inverse QuantX 150 is fed to the Vector Quantization
Indices Reorderer 151, which feeds the inverse vector
quantizer 152, which in turn feeds the dimension
normalizer 153. The vector unformatter 154 receives the
output from the dimension normalizer 153 and outputs
its signal as the output of the inverse QuantX 150. As in
the QuantX 140, in the Inverse QuantX 150, LVQ of
dimension 16 is tried first. If it produces errors higher
than a threshold, then the LVQ of dimension 4 is tried.
The specification of LVQ is the same as that in experiment
T5 in the MPEG-4 trials.

Entropy Coding 

We now discuss a VL coding and decoding method
for coefficient (run, level) events, which are coded
exploiting statistical variations to achieve even further
efficiency.

In GDT coding, if extended QuantX Method 1 is employed,
a maximum run of 396 is possible and a level of
at least ±255 needs to be supported. For coding of
luminance run/level events, the intra VLC table of U.S.
Patent Application No. 08/###,###, entitled "Adaptive
and Predictive Coding for Image Coding and Intra Coding
of Video" by Puri, Schmidt and Haskell is employed.
U.S. Patent Application No. 08/###,### is hereby
incorporated by reference as if recited herein in its
entirety. However, since this table supports only a maximum
run of 64 and level of ±128 (same as the MPEG-4 VM), it is
extended to outside of this region by appending one extra
bit for level and three extra bits for run and thus uses
up to 25 bits. For coding of chrominance run/level
events, the VLC table used is the one in the VM, extended
to support a maximum run of 396 and level of ±255
by escaping outside of the currently supported region
by appending one extra bit for level and three extra bits
for run, thus using up to 26 bits. In case of LDT coding,
since the subregion size is 8x8, the VLC table of the
aforementioned earlier patent application that was
incorporated by reference is employed for luminance and
the VLC table of VM is employed for chrominance; neither
of these tables requires any extensions.

If extended quantization QuantX Method 2 is employed,
in GDT coding, since the vector size is 8x8, the
VLC table of the previously incorporated by reference
patent application is employed for luminance and the
VLC table of VM is employed for chrominance; neither of
these tables requires any extensions. In the case
of LDT coding, a maximum run for luminance is 15 and
that for chrominance is 3; in this case new tables which
are subsets of the tables of the previously incorporated
patent application are employed.

If extended quantization QuantX Method 3 is em- 
ployed, VLC tables used are based on tables available 
in MPEG-4 core experiment T5 and are available pub- 
licly. 

Adaptive Deinterleaved Transform Coding

Further improvements in DT coding are possible by
using an encoding structure as shown in FIG 16, which
shows the block diagram of an Adaptive Global Deinterleaved
Transform (AGDT) encoder 160. The major difference
with respect to FIG 1 is that a quadtree segmentation
is applied by a Quadtree Segmenter to the entire picture
or VOP prior to the deinterleaver, and that this
segmentation is adaptive rather than fixed. Thus,
the deinterleaving is performed only on the portions
identified by the Global Quadtree Segmenter to be worth
deinterleaving, while others are coded without
deinterleaving. The operation of other blocks is similar to
that discussed for fixed GDT.

Referring to FIG 16, the image is fed into the Global
Quadtree Segmenter 161, whose output is passed to
the deinterleaver 162, which in turn passes its output to
the transform 163. The QuantX 164 receives the output
from the transform 163 and passes its output to the
entropy encoder 165, which outputs the coded bitstream.

FIG 17 shows the block diagram of the AGDT de- 
coder 170 corresponding to the AGDT encoder 160 
shown in FIG 16. The coded bitstream is fed into the 
entropy decoder 171, the output of which is passed to 
the inverse QuantX 172, which in turn passes its output 
to the inverse transform 173. The reinterleaver 174 re- 
ceives the output from the inverse transform 173 and 
feeds its output to the Global Quadtree Assembler 175, 
which outputs the reconstructed image. 
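
The deinterleaver and reinterleaver pair used in the encoder and decoder above can be illustrated with a minimal sketch. It assumes that k:1 deinterleaving means taking every k-th sample horizontally and vertically, so that an image is split into k x k subsampled subsets; the function names and the dictionary layout are hypothetical and are not taken from the specification.

```python
def deinterleave(image, ratio):
    """Split a 2-D image (list of rows) into ratio*ratio subsampled subsets.

    Subset (r, c) holds the samples image[y][x] with y % ratio == r and
    x % ratio == c, so each subset is a decimated copy of the image.
    """
    return {
        (r, c): [row[c::ratio] for row in image[r::ratio]]
        for r in range(ratio)
        for c in range(ratio)
    }


def reinterleave(subsets, ratio):
    """Inverse of deinterleave: put every sample back at its original position."""
    height = sum(len(subsets[(r, 0)]) for r in range(ratio))
    width = sum(len(subsets[(0, c)][0]) for c in range(ratio))
    image = [[None] * width for _ in range(height)]
    for (r, c), subset in subsets.items():
        for j, row in enumerate(subset):
            for i, value in enumerate(row):
                image[r + j * ratio][c + i * ratio] = value
    return image


# 8x8 test image whose sample value encodes its position (10*y + x).
image = [[10 * y + x for x in range(8)] for y in range(8)]
subsets = deinterleave(image, 2)          # 2:1 in both directions -> 4 subsets
restored = reinterleave(subsets, 2)       # lossless round trip
```

In a real coder the transform, QuantX and entropy coding stages would sit between the two calls; the sketch only shows that the sample rearrangement itself is exactly invertible.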

FIG 18 shows a block diagram of an Adaptive Local
Deinterleaved Transform (ALDT) encoder 180. The major
difference with respect to FIG 16 is that the quadtree
segmentation is applied locally (on regions) rather than
on the entire picture or VOP. Deinterleaving is then
performed on the regions identified by the Local Quadtree
Segmenter as worth deinterleaving. The remaining
blocks are similar to those described above.

The image signal is input to the Local Quadtree
Segmenter 181, whose output is fed to the deinterleaver
182, which passes its output to the transform 183. The
QuantX 184 receives the output from the transform 183
and passes its output to the entropy encoder 185, which
outputs the coded bitstream.

FIG 19 shows the ALDT decoder 190 that corresponds
to the encoder of FIG 18. The coded bits are fed
into the entropy decoder 191, which passes its output
to the inverse QuantX 192, which in turn passes its
output to the inverse transform 193, which in turn passes
its output to the reinterleaver 194. The Local Quadtree
Assembler 195 receives the output of the reinterleaver
194 and outputs the reconstructed image.



Quadtree Segmenter

As shown in FIGs 16 and 18, quadtree segmentation
is employed prior to deinterleaving to allow adaptation
of the amount of deinterleaving to the spatial content
of the picture being coded.

An example of quadtree segmentation employed is
shown in FIG 20; both GDT and LDT use this type of
segmentation, the only difference being in the number
of levels employed - GDT employs _ levels of
segmentation, whereas LDT employs _ levels of segmentation.
As shown in FIG 20, picture block 200 is segmented
into subblocks 202-205. Subblock 203 is then further
partitioned into sections 206-209. The remaining subblocks
are not segmented, which indicates that this process
segments only the blocks that need it.
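
The segmentation just described can be sketched as a short recursive routine. This is an illustration only: the text does not state the criterion used to decide whether a block is worth splitting, so the decision is left to a caller-supplied callback (`needs_split`, a hypothetical name), and the choice of which quadrant is split further in the example is likewise an assumption.

```python
def quadtree_segment(x, y, size, max_levels, needs_split):
    """Return the leaf blocks (x, y, size) of a quadtree segmentation.

    needs_split(x, y, size) is a caller-supplied test; the split criterion
    is not given in the text, so it is a hypothetical callback here.
    """
    if max_levels == 0 or size < 2 or not needs_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_segment(x + dx, y + dy, half,
                                       max_levels - 1, needs_split)
    return leaves


def example_rule(x, y, size):
    """Split the whole 16x16 block, then only one of its four quadrants."""
    return size == 16 or (x, y, size) == (8, 0, 8)


# Two levels of segmentation, in the spirit of FIG 20:
# three unsplit 8x8 subblocks plus four 4x4 sections.
leaves = quadtree_segment(0, 0, 16, 2, example_rule)
```

Only the blocks for which the callback returns true are split, so the number of leaves adapts to the picture content rather than being fixed in advance.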

Syntax and Semantics for MPEG-4 

We now provide the syntax and semantics
needed for generating coded bitstreams using the
present invention. The various classes referred to below
correspond to the current syntax of MPEG-4 VM3.2.



VideoSession Class

No changes are necessary for this class.

VideoObject Class

No changes are necessary for this class.

VideoObject Layer Class

No changes are necessary for this class.

VideoObject Plane Class

Two new syntax elements are introduced in this
class as follows.



region_size 
deinterleave_ratio 



These syntax elements are defined as follows. 
region_size 

This is a 3 bit code which specifies the size of the 
region on which deinterleaving is performed prior to cod- 
ing. The size of the region for each code is shown in 
Table 1 as follows: 

Table 1

Code    Meaning
000     16x16
001     32x32
010     64x64
011     128x128
100     reserved
101     reserved
110     reserved
111     full picture



deinterleave_ratio 

This is a 3-bit code which specifies the amount of
deinterleaving performed on the identified region prior
to coding. The same deinterleaving ratio is used both
horizontally and vertically. The amount of deinterleaving
for each code is shown in Table 2 as follows.

Table 2

Code    Meaning
000     1:1
001     2:1
010     4:1
011     8:1
100     16:1
101     reserved
110     reserved
111     reserved
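
As an illustration of how a decoder might interpret the two syntax elements, the following sketch maps the 3-bit codes of Tables 1 and 2 to their meanings. It assumes, hypothetically, that the two codes appear back to back in the order in which they are listed above; the function name and the bit-string input format are likewise assumptions made for the example.

```python
# Table 1: region_size codes (reserved codes are simply absent).
REGION_SIZE = {"000": 16, "001": 32, "010": 64, "011": 128,
               "111": "full picture"}
# Table 2: deinterleave_ratio codes, stored as k for a k:1 ratio.
DEINTERLEAVE_RATIO = {"000": 1, "001": 2, "010": 4, "011": 8, "100": 16}


def decode_vop_fields(bits):
    """Decode region_size and deinterleave_ratio from a 6-character bit string.

    Assumes (hypothetically) that the two 3-bit codes are adjacent and in the
    order in which they are listed; reserved codes raise ValueError.
    """
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("expected six binary digits")
    size_code, ratio_code = bits[:3], bits[3:]
    if size_code not in REGION_SIZE:
        raise ValueError("reserved region_size code " + size_code)
    if ratio_code not in DEINTERLEAVE_RATIO:
        raise ValueError("reserved deinterleave_ratio code " + ratio_code)
    return REGION_SIZE[size_code], DEINTERLEAVE_RATIO[ratio_code]


# Region of 32x32 samples with 8:1 deinterleaving in each direction.
fields = decode_vop_fields("001011")
```

Rejecting the reserved codes explicitly, rather than mapping them to a default, keeps a decoder from silently misreading a bitstream that uses a later extension of the tables.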



Region Class 

Data for each region consists of region header fol- 
lowed by subregion data. 






Table 3

Rtype    Rquant    Subregion data



Rquant 

Rquant is a 3 bit quantizer that takes nonlinear val- 
ues bounded by 1 to 31 with the meaning as shown in 
Table 4. 

Table 4

Code    Qp
000     2
001     4
010     7
011     10
100     14
101     18
110     23
111     28
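
A small sketch of how the nonlinear Rquant code might be used follows. It takes the eight Qp values as 2, 4, 7, 10, 14, 18, 23 and 28 for codes 000 to 111, which is an assumption consistent with the stated bound of 1 to 31; the helper names are hypothetical, and the nearest-value selection rule on the encoder side is an illustrative choice, not something the text specifies.

```python
# Qp value for each 3-bit Rquant code; the list index is the code value.
# These eight values are an assumption (see the lead-in above).
RQUANT_QP = [2, 4, 7, 10, 14, 18, 23, 28]


def rquant_to_qp(code):
    """Decoder side: map a 3-bit Rquant code (0..7) to its quantizer Qp."""
    if not 0 <= code < len(RQUANT_QP):
        raise ValueError("Rquant is a 3-bit code")
    return RQUANT_QP[code]


def nearest_rquant(target_qp):
    """Encoder side (illustrative): pick the code whose Qp is closest to the
    desired quantizer; ties resolve to the smaller code."""
    return min(range(len(RQUANT_QP)),
               key=lambda code: abs(RQUANT_QP[code] - target_qp))


code = nearest_rquant(20)      # closest table entry to a desired Qp of 20
qp = rquant_to_qp(code)
```

Because the table is nonlinear, the step between neighbouring codes grows with Qp, giving finer control at low quantizer values where small changes matter most.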



Subregion Class 

The definition of subregion data is dependent on the
QuantX method employed and is specified as follows:

For QuantX Method 1:

Table 5

Structure of Subregion Class for QuantX Method 1

Cod_subreg    Tcoefs_subreg



Cod_subreg 



Cod_subreg is a 1-bit flag that identifies if there is
any coded data (nonzero values) for that subregion.



Tcoefs_subreg 

Tcoefs_subreg are the differential quantized
coefficients of a subregion.

For QuantX Method 2:

Table 6

Structure of Subregion Class for QuantX Method 2

Cod_vector    Tcoefs_vector



Cod_vector 

Cod_vector is a 1-bit flag that identifies if there is any
coded data for a subregion.

Tcoefs_vector

Tcoefs_vector refers to twice quantized coefficients
of a vector.

For QuantX Method 3:



Claims

1. A method for coding an image comprising the steps
of:

a) deinterleaving the image to form a plurality
of image subsets;

b) transforming the plurality of image subsets
into a plurality of transform coefficients;

c) converting the plurality of transform coeffi-
cients to a plurality of samples; and

d) performing an entropy encoding of the plu-
rality of samples to form an encoded bit stream.

2. A method for decoding a bit stream representing an
image that has been encoded, comprising the steps
of:

a) performing an entropy decoding of the bit
stream to form a plurality of samples;

b) converting the plurality of samples to a plu-
rality of transform coefficients;

c) performing an inverse transformation on the
plurality of transform coefficients to form a plu-
rality of image subsets; and

d) reinterleaving the plurality of image subsets
to form the image.

3. The method according to claim 2, wherein the step
b) of converting comprises the steps of:

(i) performing an inverse scanning of the plu-
rality of samples to form a plurality of difference
values;

(ii) adding the plurality of difference values to a
plurality of predicted values to form a plurality
of quantized values;

(iii) performing a DC and AC coefficient predic-
tion on the plurality of quantized values to form
the plurality of predicted values; and

(iv) performing an inverse quantization of the
plurality of quantized values to form the plurality
of transform coefficients.

4. The method according to claim 2, wherein the step
b) of converting comprises the steps of:

(i) performing an inverse scan of the plurality of
samples to form a plurality of quantized values;

(ii) performing an inverse quantization of the
plurality of quantized values to form a plurality
of transformed vectors using a first quantization
level;

(iii) performing an inverse transformation of the
plurality of transformed vectors to form a plu-
rality of vectors;

(iv) generating a plurality of quantized trans-
form coefficients from the plurality of vectors;
and

(v) performing an inverse quantization of the
plurality of quantized transform coefficients to
form the plurality of transform coefficients using
a second quantization level, wherein said first
quantization level is greater than said second
quantization level.

5. The method according to claim 2, wherein the step
b) of converting comprises the steps of:

(i) performing a vector quantization indices re-
ordering of the plurality of samples to form a
plurality of quantized vectors;

(ii) performing an inverse vector quantization of
the plurality of quantized vectors to form a plu-
rality of limited dimension vectors;

(iii) normalizing a dimension of the plurality of
limited dimension vectors to form a plurality of
vectors; and

(iv) unformatting the plurality of vectors to form
the plurality of transform coefficients.

6. The method according to claim 2, wherein the step
d) of reinterleaving further comprises the steps of:

(i) reinterleaving the plurality of image subsets
to form a plurality of segments; and

(ii) assembling the plurality of segments formed
by the reinterleaving step d) (i) into the image.

7. A method for coding an image comprising the steps 
of: 

a) segmenting the image into a plurality of local 
regions; 

b) deinterleaving the plurality of local regions to 
form a plurality of deinterleaved regions; 

c) transforming the plurality of deinterleaved re- 
gions into a plurality of transform coefficients; 

d) converting the plurality of transform coeffi- 
cients to a plurality of samples; and 

e) performing an entropy encoding of the plu- 
rality of samples to form an encoded bit stream. 

8. A method for decoding a bit stream representing an
image that has been encoded, comprising the steps
of:

a) performing an entropy decoding of the bit
stream to form a plurality of samples;

b) converting the plurality of samples to a plu-
rality of transform coefficients;

c) performing an inverse transformation of the 
plurality of transform coefficients to form a plu- 
rality of deinterleaved regions; 

d) reinterleaving the plurality of deinterleaved 
regions to form a plurality of local regions; and 

e) assembling the plurality of local regions to 
form the image. 

9. The method according to claim 8, wherein the step 
b) of converting comprises the steps of: 

(i) performing an inverse scanning of the plu- 
rality of samples to form a plurality of difference 
values; 

(ii) adding the plurality of difference values to a 
plurality of predicted values to form a plurality 
of quantized values; 

(iii) performing a DC and AC coefficient predic- 
tion on the plurality of quantized values to form 
the plurality of predicted values; and 

(iv) performing an inverse quantization of the 
plurality of quantized values to form the plurality 
of transform coefficients. 

10. The method according to claim 8, wherein the step 
b) of converting comprises the steps of: 

(i) performing an inverse scan of the plurality of 
samples to form a plurality of quantized values; 

(ii) performing an inverse quantization of the 
plurality of quantized values to form a plurality 
of transformed vectors using a first quantization 
level; 

(iii) performing an inverse transformation of the 
plurality of transformed vectors to form a plu- 
rality of vectors; 

(iv) generating a plurality of quantized trans- 
form coefficients from the plurality of vectors; 
and 

(v) performing an inverse quantization of the 
plurality of quantized transform coefficients to 
form the plurality of transform coefficients using 
a second quantization level, wherein said first 
quantization level is greater than said second 
quantization level. 

11. The method according to claim 8, wherein the step 
b) of converting comprises the steps of: 

(i) performing a vector quantization indices re- 
ordering of the plurality of samples to form a 
plurality of quantized vectors; 

(ii) performing an inverse vector quantization of 
the plurality of quantized vectors to form a plu- 
rality of limited dimension vectors; 

(iii) normalizing a dimension of the plurality of
limited dimension vectors to form a plurality of
vectors; and

(iv) unformatting the plurality of vectors to form
the plurality of transform coefficients.

12. The method according to claim 8, wherein the step 
d) of reinterleaving further comprises the steps of: 

(i) reinterleaving the plurality of deinterleaved
regions to form a plurality of local region seg-
ments; and

(ii) assembling the plurality of local region seg- 
ments to form the plurality of local regions. 

13. An apparatus for coding an image comprising:

a) a deinterleaver receiving the image and dein-
terleaving the image to form a plurality of image
subsets;

b) a transformer being coupled to the deinter-
leaver and transforming the plurality of image
subsets into a plurality of transform coefficients;

c) a converter being coupled to the transformer
and converting the plurality of transform coeffi-
cients to a plurality of samples; and

d) an encoder being coupled to the converter
and performing an entropy encoding of the plu-
rality of samples to form an encoded bit stream.

14. An apparatus for decoding a bit stream represent-
ing an image that has been encoded, comprising:

a) a decoder receiving the bit stream and per-
forming an entropy decoding of the bit stream
to form a plurality of samples;

b) a converter being coupled to the decoder and
converting the plurality of samples to a plurality
of transform coefficients;

c) a reverse transformer being coupled to the
converter and performing an inverse transfor-
mation of the plurality of transform coefficients
to form a plurality of image subsets; and

d) a reinterleaver being coupled to the reverse
transformer and reinterleaving the plurality of
image subsets to form the image.

15. The apparatus according to claim 14, wherein the
converter comprises an inverse extended quantiz-
er, said inverse extended quantizer including:

a) an inverse scanner being coupled to the de-
coder and performing an inverse scan on the
plurality of samples to form a plurality of differ-
ence values;

b) an adder having a first input being coupled
to the inverse scanner, having a second input,
and having an output outputting a plurality of
quantized values;

c) a coefficient predictor having an input being
coupled to the output of the adder, performing
a DC and AC coefficient prediction on the plu-
rality of quantized values to form a plurality of
predicted values, and having an output being
coupled to the second input of the adder,
wherein the plurality of quantized values equals
a sum of the plurality of difference values and
the plurality of predicted values; and

d) an inverse quantizer being coupled to the
adder and inverse quantizing the plurality of
quantized values to form the plurality of trans-
form coefficients.

16. The apparatus according to claim 14, wherein the
converter comprises an inverse extended quantiz-
er, said inverse extended quantizer including:

a) an inverse scanner being coupled to the de-
coder and performing an inverse scan of the
plurality of samples to form a plurality of quan-
tized values;

b) a first inverse quantizer being coupled to the
inverse scanner and performing an inverse
quantization of the plurality of quantized values
to form a plurality of transformed vectors using
a first quantization level;

c) an inverse transformer being coupled to the
inverse quantizer and performing an inverse
transformation of the plurality of transformed
vectors to form a plurality of vectors;

d) a vector unformatter being coupled to the in-
verse transformer and generating a plurality of
quantized transform coefficients from the plu-
rality of vectors; and

e) a second inverse quantizer being coupled to
the vector unformatter and generating the plu-
rality of transform coefficients from the plurality
of quantized transform coefficients using a sec-
ond quantization level, wherein the first quanti-
zation level is greater than the second quanti-
zation level.

17. The apparatus according to claim 14, wherein the
converter comprises an inverse extended quantiz-
er, said inverse extended quantizer comprising:

a) a vector quantization indices reorderer being
coupled to the decoder and performing a vector
quantization indices reordering of the plurality
of samples to form a plurality of quantized vec-
tors;

b) an inverse vector quantizer being coupled
to the vector quantization indices reorderer and
performing an inverse vector quantization of
the plurality of quantized vectors to form a plu-
rality of limited dimension vectors;

c) a dimension normalizer being coupled to the
inverse vector quantizer and normalizing a
dimension of the plurality of limited dimension
vectors to form a plurality of vectors; and

d) a vector unformatter being coupled to the di-
mension normalizer and unformatting the plu-
rality of vectors to form the plurality of transform
coefficients.

18. The apparatus according to claim 14, further com- 
prising: 

a) a global quadtree assembler being coupled 
to the reinterleaver, wherein the reinterleaver 
reinterleaves the plurality of image subsets to 
form a plurality of segments, and the global 
quadtree assembler assembles the plurality of 
segments formed by the reinterleaver into the 
image. 

19. A system for coding an image comprising: 

a) a segmenter receiving the image and seg- 
menting the image into a plurality of local re- 
gions; 

b) a deinterleaver being coupled to the seg- 
menter and deinterleaving the plurality of local 
regions to form a plurality of local region sub- 
sets; 

c) a transformer being coupled to the deinter- 
leaver and transforming the plurality of local re- 
gion subsets into a plurality of transform coef- 
ficients; 

d) a converter being coupled to the transformer 
and converting the plurality of transform coeffi- 
cients to a plurality of samples; and 

e) an encoder being coupled to the converter 
and performing an entropy encoding of the plu- 
rality of samples to form an encoded bit stream. 

20. A system for decoding a bit stream representing an 
image that has been encoded, comprising: 

a) a decoder receiving the bit stream and per- 
forming an entropy decoding of the bit stream 
to form a plurality of samples; 

b) a converter being coupled to the decoder and 
converting the plurality of samples to a plurality 
of transform coefficients; 

c) a reverse transformer being coupled to the 
converter and performing an inverse transfor- 
mation of the plurality of transform coefficients 
to form a plurality of local region subsets; 

d) a reinterleaver being coupled to the reverse 
transformer and reinterleaving the plurality of 
local region subsets to form a plurality of local 
regions; and 

e) an assembler being coupled to the reinter-
leaver and assembling the plurality of local re-
gions to form the image.